Standalone Install

1. Setup Tutorial Assumptions

We assume that a server running CentOS 7/CentOS 8 or RHEL 7/RHEL 8 is set up and ready to install Collibra DQ in the home directory (base path: OWL_BASE) under the subdirectory owl (install path: $OWL_BASE/owl). There is no requirement for Collibra DQ to be installed in the home directory, but the Collibra DQ Full Installation script may lead to permission-denied issues during local PostgreSQL server installation if paths other than the home directory are used. If these issues occur, please adjust your directory permission to allow the installation script a write access to the PostgreSQL data folder.

Note Collibra DQ is supported to run on RHEL 8. You can use your PostgreSQL database instance.

This tutorial assumes that you are installing Collibra DQ on a brand new compute instance on Google Cloud Platform. Google Cloud SDK setup with proper user permission is assumed. This is optional, as you are free to create Full Standalone Installation setup on any cloud service provider or on-premise.

Please refer to the GOAL section for the intended outcome of each step and modify accordingly.

Note The full install package supports CentOS 7/CentOS 8 and RHEL 7/RHEL 8. If another OS is required, please follow the basic install process. You must have a PostgreSQL database installed and accessible, prior to running an RHEL 8 installation package.

Note Collibra DQ Metastore supports AWS RDS or AWS Aurora.

Copy
# Create new GCP Compute Instance named "install"
gcloud compute instances create install \
    --image=centos-7-v20210701 \
    --image-project=centos-cloud \
    --machine-type=e2-standard-4

# SSH into the instance as user "centos"
gcloud compute ssh --zone "us-central1-a" --project "gcp-example-project" "centos@full-standalone-installation"

GOAL

  1. Create a new compute instance on a cloud provider (if applicable).
  2. Access the server where Collibra DQ will be installed.

2. Download DQ Full Package

Download full package tarball using the signed link to the full package tarball provided by the Collibra DQ Team. Replace <signed-link-to-full-package> with the link provided.

Copy
### Go to the OWL_BASE (home directory of the user is most common)
### This example uses /home/owldq installing as the user owldq

cd /home/owldq 

### Download & untar
curl -o dq-full-package.tar.gz "<signed-link-to-full-package>"
tar -xvf dq-full-package.tar.gz

### Clean-up unnecessary tarball (optional)
rm dq-full-package.tar.gz

GOAL

  1. Download the full package tarball and place it in the $OWL_BASE (home directory). Download via curl or upload directly via FTP. The tarball name is assumed to be dq-full-package.tar.gz for the sake of simplicity.
  2. Untar dq-full-package.tar.gz to OWL_BASE.

3. Install Collibra DQ + PostgreSQL + Spark

First set some variables for OWL_BASE (where to install Collibra DQ. In this tutorial, you are already in the directory that you want to install), OWL_METASTORE_USER (the PostgreSQL username used by DQ Web Application to access PostgreSQL storage), and OWL_METASTORE_PASS (the PostgreSQL password used by DQ Web Application to access PostgreSQL storage).

Copy
### The base path that you want Collibra DQ installed under. Trailing spaces are not permitted.

export OWL_BASE=$(pwd)
export OWL_METASTORE_USER=postgres
# minimum complexity recommended (18 length, upper, lower, number, symbol)
# example below
export OWL_METASTORE_PASS=H55Mt5EbXh1a%\$aiX6
# The default Spark version is 3.4.1, but you can opt to install Spark 3.2.2 or 3.0.1 instead.
# Spark 3.4.1 
export SPARK_PACKAGE=spark-3.4.1-bin-hadoop3.2.tgz
# Spark 3.2.2 
export SPARK_PACKAGE=spark-3.2.2-bin-hadoop3.2.tgz

Warning The $ symbol is not a supported special character in your PostgreSQL Metastore password.

dq-package-full.tar.gz that you untarred contains installation packages for Java 9 or Java 11, PostgreSQL 11, and Spark. There is no need to download these components. These off-line installation components are located in $(pwd)/package/install-packages .

List of off-line installed packages

One of the files extracted from the tarball is setup.sh. This script installs Collibra DQ and the required components. If a component already exist (for example, Java 8 is already installed and $JAVA_HOME is set), then that component is not installed (i.e. Java 8 installation is skipped).

To control which components are installed, use the -options=... parameter. The argument provided should be a comma-delimited list of components to install (valid inputs: spark, postgres, owlweb, and owlagent. -options=postgres,spark,owlweb,owlagent means "install PostgreSQL, Spark pseudo cluster, DQ Web Application, and DQ Agent". Note that Java is not part of the options. Java 8 or Java 11 installation is automatically checked and installed/skipped depending on availability.

At a minimum, you must specify -options=spark,owlweb,owlagent if you independently installed PostgreSQL or use an external PostgreSQL connection (as shown in Step 4, if you choose that installation route).

Copy
### The following installs PostgresDB locally as part of Collibra DQ install

./setup.sh \
    -owlbase=$OWL_BASE \
    -user=$OWL_METASTORE_USER \
    -pgpassword=$OWL_METASTORE_PASS \
    -options=postgres,spark,owlweb,owlagent

Warning The $ symbol is not a supported special character in your PostgreSQL Metastore password.

Note If you are prompted to install Java 8 or Java 11 because you do not have one of them installed, accept to install from a local package.

You are prompted to select a location to install PostgreSQL, as shown below:

Copy
Postgres DB needs to be intialized. Default location = <OWL_BASE>/postgres/data
to change path please enter a FULL valid path for Postgres and hit <enter>
DB Path [ <OWL_BASE>/owl/postgres/data ] = 

If the data files for the PostgreSQL database need to be hosted at a specific location, provide it during this prompt. Ensure the directory is writable. Otherwise, press <Enter> to install the data files into $OWL_BASE/owl/postgres/data. The default suggested path does not have permission issues if you use OWL_BASE as the home directory.

If no exceptions occur and the installation is successful, then the process completes with the following output:

Copy
installing owlweb
starting owlweb
starting owl-web
installing agent
not starting agent
install complete
please use owl owlmanage utility to configure license key and start owl-agent after owl-web successfully starts up

GOAL

  1. Specify OWL_BASE path where Collibra DQ will be installed and specify PostgreSQL environment variables.
  2. Install DQ Web with PostgreSQL and Spark linked to DQ Agent (all files are in the $OWL_BASE/owl sub-directory) using setup.sh script provided. The location of OWL_BASE and PostgreSQL are configurable, but we advise you take the defaults.

4. Install DQ + Spark and use existing PostgreSQL (advanced)

Note  Skip Step 4 if you opted to install PostgreSQL and performed Step 3 instead.

We recommend Step 4 over Step 2 for advanced DQ Installer.

If you already installed DQ in the previous step, then skip this step. This is only for those who want to use external PostgreSQL (e.g. use GCP Cloud SQL service as the PostgreSQL metadata storage). If you have an existing PostgreSQL installation, then everything in the previous step applies except the PostgreSQL data path prompt and the setup.sh command.

Refer to Step 3 for details on OWL_BASE, OWL_METASTORE_USER , and OWL_METASTORE_PASS.

Copy
# The base path that you want Collibra DQ installed. No trailing

export OWL_BASE=$(pwd)
export OWL_METASTORE_USER=postgres

# minimum complexity recommended (18 length, upper, lower, number, symbol)
# example below
export OWL_METASTORE_PASS=H55Mt5EbXh1a%$aiX6

Run the following installation script. Note the missing "Postgres" in -options and new parameter -pgserver. This -pgserver could point to any URL that the standalone instance has access to.

Copy
# The following does not install PostgresDB and 
# uses existing PostgresDB server located in localhost:5432 with "postgres" database
./setup.sh \
    -owlbase=$OWL_BASE \
    -user=$OWL_METASTORE_USER \
    -pgpassword=$OWL_METASTORE_PASS \
    -options=spark,owlweb,owlagent \
    -pgserver="localhost:5432/postgres"

The database named postgres is used as the default DQ metadata storage. Changing this database name is out-of-scope for Full Standalone Installation. Contact the Collibra DQ team for assistance.

GOAL

  1. Specify OWL_BASE path where Collibra DQ will be installed and specify PostgreSQL environment variables
  2. Install DQ Web and Spark linked to DQ Agent (all files will be in $OWL_BASE/owl sub-directory) using setup.sh script provided and link DQ Web to an existing PostgreSQL server.

5. Verify DQ and Spark Installation

The installation process will start the DQ Web Application. This process initializes the PostgreSQL metadata storage schema in PostgreSQL under the database named postgres. This process must complete successfully before the DQ Agent can be started. Wait approximately 1 minute for the PostgreSQL metadata storage schema to be populated. If you can access DQ Web using <url-to-dq-web>:9000 using a Web browser, then this means you have successfully installed DQ.

Login page of DQ Web UI

Next, verify that the Spark Cluster has started and is available to run DQ checks using <url-to-dq-web>:8080. Take note of the Spark Master URL starting with spark://.... This is required during DQ Agent configuration.

Fig 3: Spark Master Web UI

6. Set License Key

In order for Collibra DQ to run checks on data, the DQ Agent must be configured with a license key. Replace <license-key> with a valid license key provided by Collibra.

Note Your license key is the value following YOUR KEY IS = in the license provision email.

Copy
cd $OWL_BASE/owl/bin
./owlmanage.sh setlic=<license-key>

# expected output:
# > License Accepted new date: <expiration-date>

7. Set License Name

It is required that you set a license name upon your initial deployment of Collibra DQ.

Replace <your-license-name> with a valid license name provided by Collibra.

Note Your license name is the value following YOUR NAME IS = in the license provision email.

Copy
vi /<install-dir>/owl/config/owl-env.sh
export LICENSE_NAME=<your-license-name>

8. Set DQ Agent Configuration

Start the DQ Agent process to enable processing of DQ checks.

Copy
# 1 Start the agent to create the agent.properties file.
cd $OWL_BASE/owl/bin
./owlmanage.sh start=owlagent

# 2 Stop the agent and add this line to agent.properties:
./owlmanage.sh stop=owlagent

# 3 Add this line to agent.properties:
sparksubmitmode=native
sparkhome=</your/spark/home/folder> 

# 4 Start the agent again.
./owlmanage.sh start=owlagent

# 5 Verify that agent.properties contains the correct details.
cd $OWL_BASE/owl/config
cat $OWL_BASE/owl/config/agent.properties

When the script successfully runs, the $OWL_BASE/owl/config folder contains a file called agent.properties. This file contains agent ID and the number of agents installed in your machine. Since this is the first non-default agent installed, the expected agent ID is 2. Verify that the agent.properties file is created. Youragent.properties is expected to have a different timestamp, but you should see agentid=2.

Copy
cd $OWL_BASE/owl/config
cat agent.properties

# expected output:
> #Tue Jul 13 22:26:19 UTC 2021
> agentid=2

Once the DQ Agent starts, it needs to be configured in DQ Web in order to successfully submit jobs to the local Spark (pseudo) cluster.

The new agent is set up with the template base path /opt and install path /opt/owl. The owlmanage.sh start=owlagent script does not respect OWL_BASE environment. You must edit the Agent Configuration to follow the OWL_BASE.

Follow the steps in the Agent section to configure the newly created DQ Agent and edit the following parameters in DQ Agent #2.

  • Replace all occurrence of /opt/owl with your $OWL_BASE/owl/in Base Path, Collibra DQ Core JAR, Collibra DQ Core Logs, Collibra DQ Script, and Collibra DQ Web Logs.
    • Note that Base Path here does not refer to OWL_BASE
  • Replace Default Master value with the Spark URL from Fig 3
  • Replace Number of Executors(s), Executor Memory (GB), Driver Memory (GB) to a reasonable default based on the size of your instance.

Refer to the Agent section for descriptions of the parameters.

Specify the Number of Core(s).

To limit Spark cores from being used for each job, a common configuration for the Free Form (Appended) field is -conf spark.cores.max=8.

Set the Default Deployment Mode option to Client for a Spark Standalone master.

Fig 4: Expected final output of edited agent based on this tutorial

9. Create DB Connection for DQ job

Follow the steps on Agent to add metastore database connection.

Note In the following examples, a DQ Job is run against local DQ Metadata Storage.

Follow the steps on Agent section to configure newly created DQ Agent.

Click the compass icon in the navigation pane to open the Explorer page. Click the metastore connection, select the public schema, and select the first table in the resulting list of tables. From the Preview and Scope page, click Build Model. When the Profile page populates, click Save/Run.

On the Run page, click Estimate Job, acknowledge the resource recommendations, and click Run.

Click the revolving arrows icon in the left navigation panel to open the Jobs page.

Wait 10 seconds and then click the refresh button above the Status column until the status shows that the DQ job is Finished. We recommend refreshing several times, pausing for a few seconds in between clicks. While a job runs, the Activity column tracks the sequence of activities DQ performs before it completes a job. A successful job shows its status as Finished last.

Troubleshooting + Helpful Commands

Copy
### Setting permissions on your pem file for ssh access

chmod 400 ~/Downloads/ssh_pem_key

Make sure working directory has permissions

For example, if I SSH into the machine with user owldq and use my default home directory location /home/owldq/

Copy
### Ensure appropriate permissions 
### drwxr-xr-x

chmod -R 755 /home/owldq

Reinstall PostgreSQL

Copy
### Postgres data directly initialization failed 
### Postgres permission denied errors
### sed: can't read /home/owldq/owl/postgres/data/postgresql.conf: Permission denied

sudo rm -rf /home/owldq/owl/postgres
chmod -R 755 /home/owldq

### Reinstall just postgres
./setup.sh -owlbase=$OWL_BASE -user=$OWL_METASTORE_USER -pgpassword=$OWL_METASTORE_PASS -options=postgres

Changing PostgreSQL password from SSH

Copy
### If you need to update your postgres password, you can leverage SSH into the VM
### Connect to your hosted instance of Postgres

sudo -i -u postgres
psql -U postgres
\password
#Enter new password: ### Enter Strong Password
#Enter it again: ### Re-enter Strong Password
\q
exit

Warning The $ symbol is not a supported special character in your PostgreSQL Metastore password.

Permissions for ssh keys when starting Spark

Copy
### Spark standalone permission denied after using ./start-all.sh 

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Tip If the recommendation above is unsuccessful, use the following commands instead of ./start-all.sh:
./start-master.sh
./start-worker.sh spark://$(hostname -f):7077

Permissions if log files are not writable

Copy
### Changing permissiongs on individual log files 

sudo chmod 777 /home/owldq/owl/pids/owl-agent.pid
sudo chmod 777 /home/owldq/owl/pids/owl-web.pid

Getting the hostname of the instance

Copy
### Getting the hostname of the instance

hostname -f

Increase Thread pool / Thread Pool Exhausted

Adding to the owl-env.sh script

Copy
# vi owl-env.sh
# modify these lines

export SPRING_DATASOURCE_POOL_MAX_WAIT=500
export SPRING_DATASOURCE_POOL_MAX_SIZE=30
export SPRING_DATASOURCE_POOL_INITIAL_SIZE=5

# restart web and agent

Adding to the Spark agent configurations

If you see the following message, update the agent configurations within load-spark-env.sh:

Copy
Failed to obtain JDBC Connection; nested exception is org.apache.tomcat.jdbc.pool.PoolExhaustedException: [pool-29-thread-2] Timeout: Pool empty. Unable to fetch a connection in 0 seconds, none available[size:2; busy:1; idle:0; lastwait:200].

Adjust the following configurations to modify the connection pool available:

Copy
export SPRING_DATASOURCE_POOL_MAX_WAIT=1000
export SPRING_DATASOURCE_POOL_MAX_SIZE=30
export SPRING_DATASOURCE_POOL_INITIAL_SIZE=5

Note The load-spark-env.sh file is located in the $SPARK_HOME/bin folder.

Adding to the owl.properties file

Depending on client vs. cluster mode and cluster type, you may also need to add the following configurations in the owl.properties file:

Copy
spring.datasource.tomcat.initial-size=5
spring.datasource.tomcat.max-active=30
spring.datasource.tomcat.max-wait=1000

Active Database Queries

Copy
select * from pg_stat_activity where state='active'

Too many open files configuration

Copy
### "Too many open files error message"
### check and modify that limits.conf file 
### Do this on the machine where the agent is running for Spark standalone version

ulimit -Ha 
cat /etc/security/limits.conf 

### Edit the limits.conf file
sudo vi /etc/security/limits.conf

### Increase the limit for example 
### Add these 3 lines 
fs.file-max=500000
*               soft    nofile           58192
*               hard    nofile           100000
### do not comment out the 3 lines (no '#'s in the 3 lines above)

Redirect Spark scratch

Copy
### Redirect Spark scratch to another location
SPARK_LOCAL_DIRS=/mnt/disks/sdb/tmp

Alternatively, you can add the following to the Free form (Appended) field on the Agent Configuration page to change Spark storage: -conf spark.local.dir=/home/owldq/owl/owltmp

Automate cleanup of Spark work folders

You can add the following line of code to owl/spark/conf/spark-env.sh, which must be copied from the spark-env.sh.template file, or to the bottom of owl/spark/bin/load-spark-env.sh.

Copy
### Set Spark to delete older files
export SPARK_WORKER_OPTS="${SPARK_WORKER_OPTS} -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=3600"

Check disk space in the Spark work folder

Copy
### Check worker nodes disk space 
sudo du -ah | sort -hr | head -5

Delete files in the Spark work folder

Copy
### Delete any files in the Spark work directory
sudo find /home/owldq/owl/spark/work/* -mtime +1 -type f -delete

Tip: Add Spark Home Environment Variables to Profile

Copy
### Adding ENV variables to bash profile

### Variable 'owldq' below should be updated wherever installed e.g. centos

vi ~/.bash_profile
export SPARK_HOME=/home/owldq/owl/spark
export PATH=$SPARK_HOME/bin:$PATH

### Add to owl-env.sh for standalone install 

vi /home/owldq/owl/config/owl-env.sh 
export SPARK_HOME=/home/owldq/owl/spark
export PATH=$SPARK_HOME/bin:$PATH

Check Processes are Running

Copy
### Checking PIDS for different components

ps -aef|grep postgres
ps -aef|grep owl-web
ps -aef|grep owl-agent
ps -aef|grep spark

Starting Components

Copy
### Restart different components 

cd /home/owldq/owl/bin/
./owlmanage.sh start=postgres
./owlmanage.sh start=owlagent
./owlmanage.sh start=owlweb

cd /home/owldq/owl/spark/sbin/
./stop-all.sh
./start-all.sh

Configuration Options

Setup.sh arguments

Argument Description
-non-interactive Skips asking to accept Java license agreement.
-skipSpark Skips the extraction of Spark components.
-stop Do not automatically start all components (Owl-Web, Zeppelin, Postgres).
-port= Set DQ Web application to use the defined port.
-user= Optional parameter to set the user to run Collibra DQ. The default is the current user.
-owlbase= Sets the base path to where you want Collibra DQ installed.
-owlpackage= Optional parameter to set the Collibra DQ package directory. The default is the current working directory.
-help Display this help and exit.
-options= The different Collibra DQ components to install in a comma-separated list format. For example, -options=owlagent,owlweb,postgres,spark
-pgpassword= The password used to set for the PostgreSQL metastore. For unattended installs.
-pgserver= The name of the PostgreSQL server. For example, -pgserver=owl-postgres-host.example.com:5432/owldb. For unattended installs.

Example:

./setup.sh -port=9000 -user=ec2-user -owlbase=/home/ec2-user -owlpackage=/home/ec2-user/packages

  • The tar ball extracted to this folder on my EC2 Instance: **** /home/ec2-user/packages/
  • Collibra DQ is running as the **** ec2-user
  • The DQ Web application runs on port 9000
  • The base location for the setup.sh script to create the will be: /home/ec2-user/

Example installing just the agent (perhaps on an Edge node of a hadoop cluster):

./setup.sh -user=ec2-user -owlbase=/home/ec2-user -owlpackage=/home/ec2-user/package -options=owlagent

  • The package extracted to this folder on my EC2 Instance: **** /home/ec2-user/packages/
  • Owl-agent is running as the **** ec2-user
  • The base location for the setup.sh script to create the Collibra DQ folder and place all packages under Collibra DQ is: /home/ec2-user/

When installing different features, the following questions are asked:

  • postgres = Postgres DBPassword needs to be supplied.
  • If postgres is not being installed (such as agent install only) postgres metastore server name needs to be supplied.

Launching and Administering Collibra DQ

When the setup.sh script finishes by default software is automatically started. The setup.sh also creates the owlmanage.sh script which allows for stopping and starting of all owl services or some components of services. The setup script will also generate an owl-env.sh script that will hold the main variables that are reused across components (see owl-env.sh under the config directory).

Collibra DQ Directory Structure after running setup.sh

Configuration of ENV settings within owl-env.sh

Contents of the owl-env.sh script and what the script is used for.

owl-env.sh Scripts Description
export SPARK_CONF_DIR="/home/collibra/owl/cdh-spark-conf" The directory on your machine where the Spark conf directory resides.
export INSTALL_PATH="/home/collibra/owl" The installation directory of Collibra DQ.
export JAVA_HOME="/home/collibra/jdk1.8.0_131" Java Home for Collibra DQ to leverage.
export LOG_PATH="/home/collibra/owl/log" The log path.
export BASE_PATH="/home/collibra" The base location under which the Collibra DQ directory resides.
export SPARK_MAJOR_VERSION=2 Spark Major version. Collibra DQ only supports 2+ version of Spark.
export OWL_LIBS="/home/collibra/owl/libs" Lib Directory to inject in spark-submit jobs.
export USE_LIBS=0

Use the lib directory or not. 0 is the default.

A value of 0 means the lib directory is not used.

A value of 1 means the lib directory is used.

export SPARKSUBMITEXE="spark-submit" Spark submit executable command. Collibra DQ using spark-submit as an example.
export ext_pass_manage=0 If using a password management system. You can enable for password to be pulled from it.

A value of 0 disables an external password management system.
A value of 1 enables an external password management system.
export ext_pass_script="/opt/owl/bin/getpassword.sh" Leverage password script to execute a get password script from the vault.
TIMEOUT=900 #15 minutes in seconds Owl-Web user time out limits.
PORT=9003 #owl-web port NUMBER The default port to use for owl-web.
SPRING_LOG_LEVEL=ERROR The logging level to be displayed in the owl-web.log
SPRING_DATASOURCE_DRIVER_CLASS_NAME=org.postgresql.Driver The driver class name for postgres metastore (used by web).
export SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/postgres JDBC connection string to Collibra DQ Postgres metastore.
export SPRING_DATASOURCE_USERNAME=collibra Collibra DQ PostgreSQL username.
export SPRING_DATASOURCE_PASSWORD=3+017wfY1l1vmsvGYAyUcw5zGL Collibra DQ PostgreSQL password.
export AUTOCLEAN=TRUE/FALSE TRUE/FALSE Enable/Disable automatically delete old datasets.
export DATASETS_PER_ROW=200000 Delete datasets after this threshold is hit (must be greater than the default to change).
export ROW_COUNT_THRESHOLD=300000 Delete rows after this threshold is hit (must be greater than the default to change).
export SERVER_HTTP_ENABLED=true Enabling HTTP to owl web
export OWL_ENC=OFF #JAVA for java encryption Enable Encryption (NOTE need to add to owl.properties also). Has to be in form owl.enc=OFF within owl.properties file to disable, and in this form owl.enc=JAVA to enable. the owl.properties file is located in the owl install path /config folder (/opt/owl/config).
PGDBPATH=/home/collibra/owl/owl-postgres/bin/data Path for PostgreSQL DB
export RUNS_THRESHOLD=5000 Delete runs after this threshold is hit (must be greater than the default to change).
export HTTP_SECONDARY_PORT=9001 Secondary HTTP port to use which is useful when SSL is enabled.
export SERVER_PORT=9000 Same as PORT.
export SERVER_HTTPS_ENABLED=true Enabling of SSL.
export SERVER_SSL_KEY_TYPE=PKCS12 Certificate trust store.
export SERVER_SSL_KEY_PASS=t2lMFWEHsQha3QaWnNaR8ALaFPH15Mh9 Certificate key password.
export SERVER_SSL_KEY_ALIAS=owl Certificate key alias.
export SERVER_REQUIRE_SSL=true Override HTTP on and force HTTPS regardless of HTTP settings.
export MULTITENANTMODE=FALSE Flipping to TRUE will enable multi tenant support.
export multiTenantSchemaHub=owlhub Schema name used for multi tenancy.
export OWL_SPARKLOG_ENABLE=false Enabling deeper spark logs when toggled to true.
export LDAP_GROUP_RESULT_DN_ATTRIBUTE

The attribute to the full path of the group object, for example, CN=OwlAppAdmin,OU=OwlGroups,OU=Groups,DC=owl, DC=com.

Default is distinguishedname.

export LDAP_GROUP_RESULT_NAME_ATTRIBUTE

The attribute to the simple name of the group, for example, OwlAppAdmin.

Default is CN.

export LDAP_GROUP_RESULT_CONTAINER_BASE Property used in the scenario where the LDAP_GROUP_RESULT_DN_ATTRIBUTE does not return a value. In this case, the LDAP_GROUP_RESULT_NAME_ATTRIBUTE prepends to this value, which creates a fully qualified LDAP path. For example, OU=OwlGroups,OU=Groups,DC=owl,DC=com. Default is <null>.

Configuration of owl.properties file

Example Description
key=XXXXXX The license key.
spring.datasource.url=jdbc:postgresql://localhost:5432/postgres The connection string to the Collibra DQ metastore (used by owl-core).
spring.datasource.password=xxxxxx The password to the Collibra DQ metastore (used by owl-core).
spring.datasource.username=xxxxxx The username to the Collibra DQ metastore (used by owl-core).
spring.datasource.driver-class-name=com.owl.org.postgresql.Driver Shaded PostgreSQL driver class name.
spring.agent.datasource.url jdbc:postgresql://$host:$port/owltrunk
spring.agent.datasource.username {user}
spring.agent.datasource.passwords {password}
spring.agent.datasource.driver-class-name org.postgresql.Driver

Starting Spark

Spark Standalone Mode - Spark 3.2.0 Documentation

Launch Scripts

To launch a Spark standalone cluster with the launch scripts, you should create a file called conf/workers in your Spark directory, which must contain the hostnames of all the machines where you intend to start Spark workers, one per line. If conf/workers does not exist, the launch scripts defaults to a single machine (localhost), which is useful for testing. Note, the master machine accesses each of the worker machines via ssh. By default, ssh is run in parallel and requires password-less (using a private key) access to be setup. If you do not have a password-less setup, you can set the environment variable SPARK_SSH_FOREGROUND and serially provide a password for each worker.

Once you’ve set up this file, you can launch or stop your cluster with the following shell scripts, based on Hadoop’s deploy scripts, and available in SPARK_HOME/sbin:

Shell Script Description
sbin/start-master.sh Starts a master instance on the machine the script is executed on.
sbin/start-workers.sh Starts a worker instance on each machine specified in the conf/workers file.
sbin/start-worker.sh Starts a worker instance on the machine the script is executed on.
sbin/start-all.sh Starts both a master and a number of workers as described above.
sbin/stop-master.sh Stops the master that was started via the sbin/start-master.sh script.
sbin/stop-worker.sh Stops all worker instances on the machine the script is executed on.
sbin/stop-workers.sh Stops all worker instances on the machines specified in the conf/workers file.
sbin/stop-all.sh Stops both the master and the workers as described above.

Note  These scripts must be executed on the machine you want to run the Spark master on, not your local machine.

Copy
### Starting Spark Standalone 

cd /home/owldq/owl/spark/sbin
./start-all.sh

### Stopping Spark 
cd /home/owldq/owl/spark/sbin
./stop-all.sh
Copy
### Starting Spark with Separate Workers

SPARK_WORKER_OPTS=" -Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1799 -Dspark.worker.cleanup.appDataTtl=3600"

### 1 start master
/home/owldq/owl/spark/sbin/start-master.sh

### 2 start workers 
SPARK_WORKER_INSTANCES=3;/home/owldq/owl/spark/sbin/start-slave.sh spark://$(hostname):7077 -c 5 -m 20g